Recent and current developments in handling Markush structures from chemical patents

Authors

  • John M. Barnard
  • Geoffrey M. Downs
Abstract

The commercially available database systems for storing and searching Markush structures from chemical patents have undergone little change since their launch some twenty years ago. However, the past few years have seen the area become an active one again for research and development. This presentation offers an overview and commentary on recent and current activity, and discusses the prospects for improved access to structural information in the patent literature [1]. The existing curated Markush databases remain the gold standard, though several groups, both academic and commercial, continue to work on automatic analysis of full-text patents. This has involved not only the identification of specific-structure nomenclature and its conversion to structure-searchable records, but also the attempted reconstruction of searchable representations of complete Markush structures. The advantages and disadvantages of these approaches, and the prospects for their successful commercial exploitation, are discussed. New commercial software for searching Markush structure databases is being developed by several groups. These systems employ both conventional substructure search approaches (e.g. ChemAxon, Digital Chemistry) and novel algorithms, in some cases based on various forms of similarity and approximate structure matching (e.g. DecrIPt, IBM). These approaches are summarised and compared, and the opportunity that their in-house implementation provides for integrating chemical patent information with the drug-discovery process is discussed. Practitioners have long been aware of the inadequacies and complexities of the existing systems, and the extent to which a new generation of systems may satisfy their requirements is discussed. The possible role of systematic evaluation of retrieval performance (in particular, the TREC-CHEM project) is addressed.


Similar resources

Representation of Markush structures: from molecules toward patents

Cheminformatics systems usually focus primarily on handling specific molecules and reactions. However, Markush structures are also indispensable in various areas, like combinatorial library design or chemical patent applications for the description of compound classes. The presentation will discuss how an existing molecule drawing tool (Marvin) and chemical database engine (JChem Base/Cartridge...


Improved chemical text mining of patents using infinite dictionaries, translation and automatic spelling correction

The text mining of patents and patent applications for chemical structures of interest to medicinal chemists poses a number of unique challenges not encountered in other fields of text analytics. Traditional text mining relies on the co-occurrence of common terms between documents to provide similarity measures that can be used to cluster and rank related documents. The more words shared betwee...


Recognition of chemical entities in patents using LeadMine

LeadMine is a dictionary/grammar based approach to entity recognition. For chemical entities, hand-written grammars are used to recognize systematic chemical names and formulae. Trivial names are found using dictionaries, some derived from public sources and some hand curated. A rule-based method is used to detect abbreviations of identified entities. To improve the system’s performance on pate...


Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985–2002. Four major research areas are discussed, these involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and fle...


Managing expectations: assessment of chemistry databases generated by automated extraction of chemical structures from patents

BACKGROUND: First public disclosure of new chemical entities often takes place in patents, which makes them an important source of information. However, with an ever-increasing number of patent applications, manual processing and curation on such a large scale becomes even more challenging. An alternative approach better suited for this large corpus of documents is the automated extraction of ch...



Journal title:

Volume 4  Issue 

Pages  -

Publication date: 2012